Abstract

A language L is prefix-free if, whenever words u and v are in L and u 
is a prefix of v, then u = v. Suffix-, factor-, and subword-free languages 
are defined similarly, where "subword" means "subsequence". A language 
is bifix-free if it is both prefix- and suffix-free. We study the quotient com- 
plexity, more commonly known as state complexity, of operations in the 
classes of bifix-, factor-, and subword-free regular languages. We find tight 
upper bounds on the quotient complexity of intersection, union, difference, 
symmetric difference, concatenation, star, and reversal in these three classes 
of languages. 

1 Introduction 

The state complexity of a regular language L is the number of states in the minimal
deterministic finite automaton (dfa) accepting L [26]. This complexity is the same
as the quotient complexity [5] of L, which is the number of distinct left quotients
of L. We prefer quotient complexity since it is more closely related to properties
of languages. The quotient complexity of an operation in a class C of regular
languages is the worst-case quotient complexity of the language resulting from
the operation, taken as a function of the quotient complexities of the operands in
class C. For surveys on state and quotient complexity see [5, 26].

*This work was supported by the Natural Sciences and Engineering Research Council of Canada 
under grant no. OGP0000871 and by the Slovak Research and Development Agency under contract 
APVV-0035-10 "Algorithms, Automata, and Discrete Data Structures". 
One of the first results concerning the state complexity of an operation is the
1966 theorem by Mirkin [18], who showed that the bound $2^n$ for the reversal of an
n-state dfa can be attained. In 1970 Maslov [17] stated without proof the bounds
on the complexities of union, concatenation, star, and several other operations in
the class of regular languages, and gave languages meeting these bounds. In 1994
these operations, along with intersection, reversal, and left and right quotients,
were studied in detail by Yu, Zhuang and Salomaa [27].

Results exist also for proper subclasses of the class of regular languages: 
unary EjllZZl, finite EUSlEgl, cofinite 0, prefix-free 02 113, suffix-free (H 
[111 El, ideal O, and closed Q. The bounds can vary considerably. 

Free languages (with the exception of {e}, where e is the empty word) are 
codes, which constitute an important class of languages and have applications 
in such areas as cryptography, data compression, and information transmission. 
They have been studied extensively; see, for example, [3, 15]. In particular, prefix
and suffix codes [3] are prefix-free and suffix-free languages, respectively, infix
codes [21, 22] are factor-free, and hypercodes [21, 22] are subword-free, where
by subword we mean subsequence. Moreover, free languages are special cases of
convex languages [1, 23]. We are interested only in regular free languages.

The state complexities of intersection, union, concatenation, star, and reversal
were first studied by Han, K. Salomaa, and Wood [12] for prefix-free languages,
and by Han and K. Salomaa [11] for suffix-free languages. In the present paper,
these results are extended to bifix-, factor-, and subword-free languages. In
particular, we obtain tight upper bounds on the complexities of intersection, union,
difference, symmetric difference, star, concatenation, and reversal in these three
classes of free languages.

2 Preliminaries 

It is assumed that the reader is familiar with finite automata and regular languages
as treated in 0911251, for example. If $\Sigma$ is a finite non-empty alphabet, then
$\Sigma^*$ is the set of all words over this alphabet, with $\varepsilon$ as the empty
word. For $w \in \Sigma^*$, let $|w|$ be the length of $w$. A language is any subset
of $\Sigma^*$.

The following set operations are defined on languages: complement
($\overline{L} = \Sigma^* \setminus L$), union ($K \cup L$), intersection ($K \cap L$),
difference ($K \setminus L$), and symmetric difference ($K \oplus L$). A general
boolean operation with two arguments is denoted by $K \circ L$.

We also define the product, usually called concatenation or catenation,
($KL = \{w \in \Sigma^* \mid w = uv, u \in K, v \in L\}$), (Kleene) star
($L^* = \bigcup_{i \ge 0} L^i$ with $L^0 = \{\varepsilon\}$), and positive closure
($L^+ = \bigcup_{i \ge 1} L^i$).
The reverse $w^R$ of a word $w \in \Sigma^*$ is defined inductively as follows:
$\varepsilon^R = \varepsilon$, and $(wa)^R = a w^R$ for every symbol $a$ in $\Sigma$
and every word $w$ in $\Sigma^*$. The reverse of a language $L$ is denoted by $L^R$
and is defined as $L^R = \{w^R \mid w \in L\}$.

Regular languages over $\Sigma$ are languages that can be obtained from the set of
basic languages $\{\emptyset, \{\varepsilon\}\} \cup \{\{a\} \mid a \in \Sigma\}$,
using a finite number of operations of union, product, and star. We use regular
expressions to represent languages. If $E$ is a regular expression, then
$\mathcal{L}(E)$ is the language denoted by that expression. For example, the regular
expression $E = (\varepsilon \cup a)^* b$ denotes language
$L = \mathcal{L}(E) = (\{\varepsilon\} \cup \{a\})^* \{b\}$. We usually do not
distinguish notationally between regular languages and regular expressions.

Whenever convenient, we derive upper bounds on the state complexity of operations
on free languages following the approach of [5]. A quotient of a language
$L$ by a word $w$ is defined as $L_w = \{x \in \Sigma^* \mid wx \in L\}$. The number
of distinct quotients of a language is called its quotient complexity and is denoted
by $\kappa(L)$.

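Quotients of regular languages flU HI can be computed as follows: First, the
$\varepsilon$-function $L^\varepsilon$ of a regular language $L$ is
$L^\varepsilon = \emptyset$ if $\varepsilon \notin L$, and $L^\varepsilon = \varepsilon$
if $\varepsilon \in L$. The quotient by a letter $a$ in $\Sigma$ is computed by
induction: $b_a = \emptyset$ if $b \in \{\emptyset, \varepsilon\}$ or $b \in \Sigma$
and $b \neq a$, and $b_a = \varepsilon$ if $b = a$;
$(\overline{L})_a = \overline{L_a}$; $(K \circ L)_a = K_a \circ L_a$;
$(KL)_a = K_a L \cup K^\varepsilon L_a$; $(L^*)_a = L_a L^*$. The quotient by a word
$w$ in $\Sigma^*$ is computed by induction on the length of $w$: $L_\varepsilon = L$
and $L_{wa} = (L_w)_a$. A quotient $L_w$ is accepting if $\varepsilon \in L_w$;
otherwise it is rejecting.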

A deterministic finite automaton (dfa) is a quintuple
$\mathcal{D} = (Q, \Sigma, \delta, q_0, F)$, where $Q$ is a finite set of states,
$\Sigma$ is a finite alphabet, $\delta : Q \times \Sigma \to Q$ is the transition
function, $q_0$ is the initial state, and $F \subseteq Q$ is the set of final or
accepting states. As usual, the transition function is extended to
$Q \times \Sigma^*$. The dfa $\mathcal{D}$ accepts a word $w$ in $\Sigma^*$ if
$\delta(q_0, w) \in F$. The set of all words accepted by $\mathcal{D}$ is
$L(\mathcal{D})$. By the language of a state $q$ of $\mathcal{D}$ we mean the language
$L_q$ accepted by the automaton $(Q, \Sigma, \delta, q, F)$. A state is empty if its
language is empty.

The quotient automaton of a regular language $L$ is the dfa
$\mathcal{D} = (Q, \Sigma, \delta, q_0, F)$, where $Q = \{L_w \mid w \in \Sigma^*\}$,
$\delta(L_w, a) = L_{wa}$, $q_0 = L_\varepsilon$, and
$F = \{L_w \mid \varepsilon \in L_w\}$. This is the minimal dfa accepting $L$. Hence
the quotient complexity of $L$ is equal to the state complexity of $L$, and we call
it simply complexity.
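
For a finite language, the quotients and the quotient automaton can be enumerated directly from the definitions above. The following is a minimal sketch and is not taken from the paper; the names `quotient` and `quotient_automaton` are illustrative assumptions, and a language is represented as a Python set of strings with `""` standing for the empty word.

```python
def quotient(lang, w):
    """Left quotient L_w = {x | wx in L} of a finite language."""
    return frozenset(x[len(w):] for x in lang if x.startswith(w))


def quotient_automaton(lang, alphabet):
    """Return (states, delta, initial, accepting) of the quotient automaton.

    The states are the distinct quotients of lang, found by repeatedly
    taking quotients by single letters, starting from lang itself.
    """
    initial = frozenset(lang)
    states, todo, delta = {initial}, [initial], {}
    while todo:
        q = todo.pop()
        for a in alphabet:
            nxt = quotient(q, a)
            delta[q, a] = nxt
            if nxt not in states:
                states.add(nxt)
                todo.append(nxt)
    # A quotient is accepting if it contains the empty word.
    accepting = {q for q in states if "" in q}
    return states, delta, initial, accepting


states, delta, initial, accepting = quotient_automaton({"ab", "ba"}, "ab")
print(len(states))  # quotient complexity of {ab, ba}: prints 5
```

The five quotients of {ab, ba} are the language itself, {b}, {a}, {ε}, and the empty quotient; only {ε} is accepting.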

3 Free Languages 

If $u, v, w, x \in \Sigma^*$ and $w = uxv$, then $u$ is a prefix of $w$, $x$ is a
factor of $w$, and $v$ is a suffix of $w$. Both $u$ and $v$ are also factors of $w$.
If $w = u_0 v_1 u_1 \cdots v_n u_n$, where $u_i, v_i \in \Sigma^*$, then
$v = v_1 v_2 \cdots v_n$ is a subword of $w$. Every factor of $w$ is also a subword
of $w$.
A language L is prefix-free (respectively, suffix-, factor-, or subword-free) if,
whenever words u and v are in L and u is a prefix (respectively, suffix, factor, or
subword) of v, then u = v. Additionally, L is bifix-free if it is both prefix- and
suffix-free. All subword-free languages are factor-free, and all factor-free
languages are bifix-free. For convenience, we refer to prefix-, suffix-, bifix-,
factor-, and subword-free languages together as free languages.
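
For finite languages, the freeness conditions can be tested by brute force over all pairs of words. The sketch below is an illustration only; the names `is_subword`, `RELATIONS`, `is_free`, and `is_bifix_free` are assumptions introduced here and do not come from the paper.

```python
def is_subword(u, v):
    """True if u is a subsequence of v."""
    remaining = iter(v)
    # Each membership test consumes the iterator up to the matched letter.
    return all(letter in remaining for letter in u)


RELATIONS = {
    "prefix": lambda u, v: v.startswith(u),
    "suffix": lambda u, v: v.endswith(u),
    "factor": lambda u, v: u in v,
    "subword": is_subword,
}


def is_free(lang, kind):
    """True if no word of lang is related (by kind) to a different word of lang."""
    related = RELATIONS[kind]
    return not any(u != v and related(u, v) for u in lang for v in lang)


def is_bifix_free(lang):
    return is_free(lang, "prefix") and is_free(lang, "suffix")


# {ab, cabd} is bifix-free but not factor-free;
# {aa, aba} is factor-free but not subword-free.
print(is_bifix_free({"ab", "cabd"}), is_free({"ab", "cabd"}, "factor"))
print(is_free({"aa", "aba"}, "factor"), is_free({"aa", "aba"}, "subword"))
```

The two sample languages show that the inclusions among the classes are proper: the first line prints `True False` and the second prints `True False`.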

If $\varepsilon$ is a quotient of $L$, then $L$ also has the empty quotient, since
$\varepsilon_a = \emptyset$ for all $a$ in $\Sigma$. We say that a quotient $L_w$ is
uniquely reachable if $L_w = L_x$ implies that $w = x$. We now restate two
propositions from [11, 12] in our terminology.

Proposition 1. A non-empty language is prefix-free if and only if it has exactly
one accepting quotient and that quotient is $\varepsilon$.

Proposition 2. The quotient by $\varepsilon$ of a non-empty suffix-free language is
uniquely reachable, and the language has the empty quotient.

Let $L$ be any language. If $(L_u)_x = L_v$ for some words $u$, $v$ and a non-empty
word $x$, then $L_v$ is positively reachable from $L_u$, and we denote this by
$L_u \to L_v$. The relation $\to$ is transitive. The next proposition uses this
relation to characterize finite languages.

Proposition 3. If $L$ is any language with the set of quotients
$\{L_1, L_2, \ldots, L_n\}$, and $u, v \in \Sigma^*$, then the following are equivalent:

1. $L$ is finite.

2. $L_u \to L_v$ and $L_v \to L_u$ if and only if $L_u = L_v = \emptyset$.

3. There exists a total order $\prec$ on the set of quotients:

$L = L_1 \prec L_2 \prec \cdots \prec L_{n-1} \prec L_n = \emptyset$

which satisfies the condition that $(L_i)_a = L_j$ implies $L_i \prec L_j$ or
$L_i = L_j = L_n$.

Proof. Suppose $L$ is a finite language. If $L_u \to L_v$ and $L_v \to L_u$, then
$(L_u)_x = L_v$ and $(L_v)_y = L_u$, for some non-empty words $x$ and $y$. If also
$L_u \neq \emptyset$, then $u(xy)^k w \in L$ for every nonnegative $k$ and any word
$w$ in $L_u$, which contradicts that $L$ is finite. Note also that
$L_u \neq \emptyset$ if $L_v \neq \emptyset$. If $L_u = L_v = \emptyset$, then
$(L_u)_a = L_u$ for every $a$ in $\Sigma$, and we have $L_u \to L_u$. Thus (1)
implies (2).

Now suppose that $L$ is infinite and $\kappa(L) = n$. Then there is a word $uxv$ in
$L$ of length at least $n$ such that $L_u = L_{ux}$ and $x \in \Sigma^+$. Thus
$L_u \to L_u$ and $L_u \neq \emptyset$, showing that (2) cannot hold. Hence (2)
implies (1).

If (1) holds, we can take the reflexive closure $\to'$ of the relation $\to$. Then
the relation $\to'$ is a partial order, and we can use any total order $\prec$
consistent with relation $\to'$ to get (3). Conversely, if (3) holds, then $L$ cannot
be infinite, by the same argument as was used to prove that (2) implies (1). □
Since every subword-free language is finite, we get the next lemma, which we
use later to prove that upper bounds on the quotient complexity of some operations
on subword-free languages cannot be reached if the alphabet of the language does
not have sufficiently many letters.

Lemma 1. Let $L$ be a subword-free language with $\kappa(L) = n$, where $n \ge 4$.
Let the distinct quotients $L = L_\varepsilon = L_1, L_2, \ldots, L_{n-2},
L_{n-1} = \varepsilon, L_n = \emptyset$ of $L$ be ordered as in Proposition 3. If
$L_w = L_2$ for some word $w$, then $|w| = 1$.

Proof. Since $n \ge 4$, the quotients $L$ and $L_2$ are not empty. Let $v$ be a word
in $L_2$. If $L_w = L_2$, then $w$ cannot be $\varepsilon$ because $L_2 \neq L_1$. If
$|w| > 1$, then $w = ua$ for a letter $a$ and a non-empty word $u$. Then
$L_u \neq L$ since $L$ is uniquely reachable. If $L_u = L_2$, then $uv \in L$ and
$uav \in L$, and language $L$ is not subword-free. Thus, if $L_u = L_i$ for some $i$,
then $i > 2$ and $L_w = (L_u)_a = (L_i)_a = L_j$ where $j > i > 2$, contradicting
that $L_w = L_2$. Thus $w$ must be a one-letter word. □

Finally, we describe a simple method of constructing free languages. 

Proposition 4. Let $L \subseteq \Sigma^*$ be any language, and let $a \notin \Sigma$.
Then (1) $aL$ is suffix-free, (2) $La$ is prefix-free, (3) $aLa$ is factor-free.

Proof. (1) Every proper suffix of a word in $aL$ is a word over the alphabet
$\Sigma$, and so is not in $aL$. Therefore $aL$ is suffix-free.

(2) The proof is dual to that of (1).

(3) Every proper factor of a word in $aLa$ contains at most one $a$ and therefore is
not in $aLa$. □
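
The construction of Proposition 4 can be tried out on a small sample. In the sketch below, the sample language `L` over {b, c} and the helper name `violations` are arbitrary choices made for illustration; they are not from the paper.

```python
def violations(lang, related):
    """Pairs (u, v) of distinct words of lang such that u is related to v."""
    return [(u, v) for u in lang for v in lang if u != v and related(u, v)]


L = {"", "b", "bc", "bcb", "cc"}  # arbitrary sample language over {b, c}
a = "a"                           # a letter outside the alphabet of L

aL = {a + w for w in L}
La = {w + a for w in L}
aLa = {a + w + a for w in L}

print(violations(aL, lambda u, v: v.endswith(u)))    # suffix pairs in aL
print(violations(La, lambda u, v: v.startswith(u)))  # prefix pairs in La
print(violations(aLa, lambda u, v: u in v))          # factor pairs in aLa
```

All three lists are empty for this sample. Note that aL by itself need not be prefix-free: here a is a prefix of ab.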

4 Boolean Operations 

The complexity of boolean operations, in the class of prefix- and suffix-free
regular languages, except for the difference and symmetric difference of suffix-free
languages, was studied in [11, 12, 13, 14]. It was shown that for prefix-free
languages, the tight bounds for union, intersection, difference, and symmetric
difference are $mn - 2$, $mn - 2(m + n - 3)$, $mn - (m + 2n - 4)$, and $mn - 2$,
respectively. For union and intersection of suffix-free languages, the tight bounds
are $mn - (m + n - 2)$ and $mn - 2(m + n - 3)$, respectively. The bounds for
difference and symmetric difference are $mn - (m + 2n - 4)$ and $mn - (m + n - 2)$,
respectively, and the bounds for all four boolean operations are met by binary
suffix-free languages [9]. The next two theorems provide results for boolean
operations on bifix-, factor-, and subword-free languages.






Theorem 1 (Boolean Operations: Bifix- and Factor-Free Languages). Let $K$
and $L$ be bifix- or factor-free languages over an alphabet $\Sigma$ with
$\kappa(K) = m$ and $\kappa(L) = n$, where $m, n \ge 4$. Then

1. $\kappa(K \cap L) \le mn - 3(m + n - 4)$;

2. $\kappa(K \setminus L) \le mn - (2m + 3n - 9)$;

3. $\kappa(K \cup L), \kappa(K \oplus L) \le mn - (m + n)$.

All the bounds are tight if $|\Sigma| \ge 3$.
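
The three bounds of Theorem 1 are easy to tabulate. The following sketch only evaluates the formulas from the statement; the function names are hypothetical and chosen here for illustration.

```python
def intersection_bound(m, n):
    """Bound of Theorem 1 for intersection."""
    return m * n - 3 * (m + n - 4)


def difference_bound(m, n):
    """Bound of Theorem 1 for difference."""
    return m * n - (2 * m + 3 * n - 9)


def union_bound(m, n):
    """Bound of Theorem 1 for union and symmetric difference."""
    return m * n - (m + n)


for m, n in [(4, 4), (5, 6), (6, 7)]:
    print(m, n, intersection_bound(m, n), difference_bound(m, n), union_bound(m, n))
```

For example, m = 5 and n = 6 give 9, 11, and 19, compared with the general bound mn = 30 for arbitrary regular languages.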

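Proof. Since $K$ and $L$ are bifix-free, by unique reachability we get a reduction of
$m + n - 2$ from the general bound $mn$. Moreover, both languages $K$ and $L$ have
$\varepsilon$ and $\emptyset$ as quotients. For intersection, we have
$\emptyset \cap L_w = K_w \cap \emptyset = \emptyset$, and the quotients
$\varepsilon \cap L_w$ and $K_w \cap \varepsilon$ are either empty or equal to
$\varepsilon$. This gives the upper bound. For difference, we eliminate $m + n - 2$
quotients by unique reachability, $n - 2$ quotients by the fact that
$\emptyset \setminus L_w = \emptyset$ (keeping only one representative
$\emptyset \setminus \emptyset$), $m - 2$ quotients by the fact that
$K_w \setminus \emptyset = K_w \setminus \varepsilon$ (keeping
$K_w \setminus \emptyset$ as a representative), and $n - 3$ more quotients by the
rule $\varepsilon \setminus L_w = \varepsilon$, for a total reduction of
$2m + 3n - 9$. For union, we have the unique reachability reduction of $m + n - 2$,
and a further reduction of 2 by the rule $\varepsilon \cup \varepsilon =
\varepsilon \cup \emptyset = \emptyset \cup \varepsilon = \varepsilon$. For
symmetric difference, we note that $\varepsilon \oplus \varepsilon =
\emptyset \oplus \emptyset = \emptyset$ and $\varepsilon \oplus \emptyset =
\emptyset \oplus \varepsilon = \varepsilon$.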

For tightness, consider $K = a(c^*(a \cup b))^{m-3}$ and
$L = a(b^*(a \cup c))^{n-3}$; see Figure 1. If $w \in K$, then $w = av$ for some word
$v$ containing $m - 3$ occurrences of symbols from $\{a, b\}$ and ending in $a$ or
$b$. This means that no proper factor of $w$ is in $K$, and so $K$ is factor-free. A
similar proof applies to $L$.

In the cross-product automaton of Figure 2 for the boolean operations on languages
$K$ and $L$, all the states are reached from the initial state $(1, 1)$ by a word
in $ab^*c^* \cup ac^*b^*$, except for state $(m - 1, n - 1)$, which is reached from
state $(m - 2, n - 2)$ by $a$.

Figure 1: Factor-free languages meeting the upper bounds for boolean operations.

Figure 2: Cross-product automaton for boolean operations on factor-free languages
from Figure 1; m = 5, n = 6.

For intersection, the only accepting state is $(m - 1, n - 1)$. All the rejecting
states in rows $m - 1$ and $m$ and columns $n - 1$ and $n$ are empty. The word $a$
is accepted only from $(m - 2, n - 2)$, the word $b^{m-2-i} c^{n-2-j} a$
($2 \le i \le m - 2$, $2 \le j \le n - 2$) only from state $(i, j)$, and the word
$ab^{m-4}c^{n-4}a$ only from state $(1, 1)$. This gives $mn - 3(m + n - 4)$
reachable and pairwise distinguishable states.

For difference, all the states of the cross-product automaton in row $m - 1$,
except for $(m - 1, n - 1)$, are accepting and accept $\varepsilon$. All the states
in row $m$, as well as state $(m - 1, n - 1)$, are empty. Moreover, states
$(i, n - 1)$ and $(i, n)$ with $2 \le i \le m - 2$ are equivalent. The word
$ab^{m-3}$ is accepted only from $(1, 1)$. Now let $(i, j)$ and $(k, \ell)$, where
$2 \le i, k \le m - 2$ and $2 \le j, \ell \le n - 1$, be two distinct states. If
$i < k$, then $c^n b^{m-1-i}$ is accepted from $(i, j)$ but not from $(k, \ell)$. If
$i = k$ and $j < \ell$, then $b^{m-2-i} c^{n-2-j} a$ is not accepted from $(i, j)$
but is accepted from $(k, \ell)$. This means that $mn - (2m + 3n - 9)$ states are
pairwise distinguishable.

For union, all the states in row $m - 1$ and in column $n - 1$ are accepting,
and moreover, the three states $(m, n - 1)$, $(m - 1, n - 1)$, and $(m - 1, n)$ are
equivalent. The word $ab^{m-3}$ is accepted only from $(1, 1)$. Consider two distinct
rejecting states $(i, j)$ and $(k, \ell)$. If $i < k$, then $c^n b^{m-1-i}$ is
accepted from $(i, j)$ but not from $(k, \ell)$. If $j < \ell$, then
$b^m c^{n-1-j}$ is accepted from $(i, j)$ but not from $(k, \ell)$. Now consider two
distinct accepting states different from $(m, n - 1)$ and $(m - 1, n)$. By $c$, the
two states either go to two states one of which is accepting and the other
rejecting, or to two distinct rejecting, and hence distinguishable, states. This
proves distinguishability of $mn - (m + n)$ states.

The proof for symmetric difference is the same as for union, except that state 
(m — 1, n — 1) is empty and states (m, n — 1) and (m — 1, n) are equivalent. □ 
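
The tightness claim can be checked mechanically for small m and n. The sketch below builds dfas for the witnesses $K = a(c^*(a \cup b))^{m-3}$ and $L = a(b^*(a \cup c))^{n-3}$ as read from the text (state numbering as in the proof: state size - 1 accepting, state size empty), forms the cross-product automaton, and counts reachable, pairwise distinguishable states by Moore-style partition refinement. The function names `witness` and `complexity` are assumptions introduced here, and the refinement is a standard minimization technique rather than the argument used in the proof.

```python
SIGMA = "abc"


def witness(size, loop, advance):
    """Transitions of a dfa with states 1..size for a(loop*(x|y))^(size-3),
    where advance = "xy". State size-1 is accepting, state size is empty."""
    delta = {(q, x): size for q in range(1, size + 1) for x in SIGMA}
    delta[1, "a"] = 2
    for q in range(2, size - 1):  # states 2, ..., size-2
        delta[q, loop] = q
        for x in advance:
            delta[q, x] = q + 1
    return delta


def complexity(m, n, op):
    """Number of reachable, pairwise distinguishable cross-product states."""
    dk, dl = witness(m, "c", "ab"), witness(n, "b", "ac")

    def step(s, x):
        return dk[s[0], x], dl[s[1], x]

    reach, todo = {(1, 1)}, [(1, 1)]
    while todo:
        s = todo.pop()
        for x in SIGMA:
            t = step(s, x)
            if t not in reach:
                reach.add(t)
                todo.append(t)

    # Start from the accepting/rejecting split and refine until stable.
    block = {s: int(op(s[0] == m - 1, s[1] == n - 1)) for s in reach}
    count = len(set(block.values()))
    while True:
        ids, new = {}, {}
        for s in reach:
            sig = (block[s],) + tuple(block[step(s, x)] for x in SIGMA)
            new[s] = ids.setdefault(sig, len(ids))
        if len(ids) == count:
            return count
        block, count = new, len(ids)


OPS = {
    "intersection": lambda x, y: x and y,
    "difference": lambda x, y: x and not y,
    "union": lambda x, y: x or y,
    "symmetric difference": lambda x, y: x != y,
}

for name, op in OPS.items():
    print(name, complexity(5, 6, op))
```

If the witnesses are encoded as intended, the printed values for m = 5 and n = 6 should agree with the bounds of Theorem 1.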



Figure 3: Binary factor-free witnesses for intersection and difference. Missing
transitions in the automaton accepting K (L) all go to the empty state m (n).

The next result shows that the upper bounds for intersection and difference of 
factor-free languages are also tight in the binary case. 

Proposition 5 (Intersection and Difference: Binary Factor-Free Languages).
There exist binary factor-free languages $K$ and $L$ with $\kappa(K) = m$ and
$\kappa(L) = n$, where $m, n \ge 6$, such that

1. $\kappa(K \cap L) \ge mn - 3(m + n - 4)$ and

2. $\kappa(K \setminus L) \ge mn - (2m + 3n - 9)$.

Proof. Let $K$ and $L$ be the binary factor-free languages accepted by the quotient
automata of Figure 3.

In the corresponding cross-product automaton of Figure 4, except for $(1, 1)$,
no states in row 1 or column 1 are reachable. Also, states $(m - 1, 2)$ and $(m, 2)$
are unreachable, as are the states in column $n - 1$, except $(3, n - 1)$,
$(m - 1, n - 1)$, and $(m, n - 1)$. The remaining states are all reachable.

For intersection, the only accepting state is $(m - 1, n - 1)$, and all the other
states in the last two rows and columns are empty. We will prove that states
$(1, 1)$, $(i, j)$ with $2 \le i \le m - 2$ and $2 \le j \le n - 2$,
$(m - 1, n - 1)$, and $(m, n)$, which represents all the empty states, are all
distinguishable. Then it follows that
$\kappa(K \cap L) \ge (m - 3)(n - 3) + 3 = mn - 3(m + n - 4)$.

State $(m, n)$ is the only empty state in our set. We show that for each other
rejecting state $(i, j)$, there exists a word $w_{i,j}$ that is accepted only from
state $(i, j)$. We have $w_{m-2,n-2} = a$ because word $a$ is accepted only from
state $(m - 2, n - 2)$. Since only one transition on letter $b$ goes to state
$(m - 2, n - 2)$, and it goes from state $(m - 3, n - 2)$, the word $ba$ is accepted
only from state $(m - 3, n - 2)$. Therefore
$w_{m-3,n-2} = ba = b\,w_{m-2,n-2}$. For similar reasons, a word accepted only from
state $(i, j)$ exists for each of the remaining states, which proves that
$mn - 3(m + n - 4)$ states are pairwise distinguishable.

Figure 4: Cross-product automaton for m = 6, n = 7. Missing transitions all go
to state (7,6).

In the case of difference, all the states in row $m$, as well as state
$(m - 1, n - 1)$, are empty. All the other states in row $m - 1$ accept
$\varepsilon$, and so are equivalent. For each $i$ with $2 \le i \le m - 2$, states
$(i, n - 1)$ and $(i, n)$ are equivalent. Among the other reachable states consider
two distinct states $p$ and $q$. If they are in different rows, then by a word in
$b^*$ we can send $p$ to a state $p'$ in row 3, and $q$ to a state $q'$ that is not
in row 3. Now by $a^n$, state $q'$ goes to the empty state, while $p'$ goes to state
$(3, n)$, which is not empty. Two distinct states in the same row go by a word in
$b^*$ to row 3. Then, by a word in $a^*$, the first goes to state $(3, n - 2)$ while
the second goes to $(3, n)$, and now $b^{m-2-3}a$ distinguishes them. In summary,
$\kappa(K \setminus L) \ge (m - 3)(n - 3) + m - 3 + 3 = mn - (2m + 3n - 9)$. □
Figure 5: Binary bifix-free languages meeting the bound $mn - (m + n) - 2$ for
union and symmetric difference.

The next proposition gives lower bounds for union and symmetric difference 
of binary bifix-free languages. 

Proposition 6 (Union, Symmetric Difference: Binary Bifix-Free Languages;
Lower Bound). Let $m, n \ge 6$. There exist binary bifix-free languages $K$ and $L$
with $\kappa(K) = m$ and $\kappa(L) = n$ such that
$\kappa(K \cup L), \kappa(K \oplus L) \ge mn - (m + n) - 2$.

Proof. Consider the binary languages

$K = a((ba^*)^{m-5}b \cup a)(b((ba^*)^{m-5}b \cup a))^* a$,
$L = a(a \cup b)^{n-4}(b(a \cup b)^{n-4})^* a$.

Quotient automata for m = 7 and n = 6 are shown in Figure 5. Since both
languages have $\varepsilon$ as the only accepting quotient, they are prefix-free.
Since the reverse automata are deterministic, the reversed languages also have
$\varepsilon$ as the only accepting quotient, and so are prefix-free. Thus both
languages are bifix-free.

The cross-product automaton is shown in Figure 6. States in row 1 and column 1
are unreachable, with the exception of the initial state $(1, 1)$. Also, states
$(2, n - 1)$ and $(m - 1, 2)$ are unreachable. The initial state $(1, 1)$ goes to
state $(2, 2)$ by $a$ and then to state $(3, 3)$ by $b$. From $(3, 3)$, all the other
states in row 3, except for $(3, 2)$, are reached by $a$-transitions. Next, state
$(3, n - 2)$ goes to state $(4, 2)$ by $b$, and then to $(4, j)$ by $a^{j-2}$
($3 \le j \le n$). In this way, all the states in rows $4, 5, \ldots, m - 3$ can be
reached. State $(m - 3, n - 2)$ goes to state $(m - 2, 2)$ by $b$, and states
$(m - 2, j)$ with $j \ge 3$, except for state $(m - 2, n - 1)$ that is reached from
$(2, n - 2)$ by $a$, are reached from states $(m - 3, j - 1)$ by $b$. States
$(2, j)$ with $j \ge 3$, except for $(2, n - 1)$, are reached from
$(m - 2, j - 1)$ by $b$. State $(2, n - 2)$ goes to $(3, 2)$ by $b$. From states in
row $m - 2$ all reachable states in row $m - 1$ are reached by $a$. State $(m, 2)$
is reached by $b$ from $(m - 1, n - 2)$; from here, all the other states in row $m$
are reached by words in $a^*$.

Figure 6: Cross-product automaton for automata from Figure 5, where dashed
transitions are on input b, and unspecified transitions go to state (7,6).

For union, the three accepting states $(m - 1, n - 1)$, $(m - 1, n)$, and
$(m, n - 1)$ are equivalent. Consider the other reachable states. First, let
$p = (i, j)$ and $q = (k, \ell)$ be two rejecting states with $i < k$. We can use
$b$-transitions to get $p$ into a state $p'$ in row 3, and $q$ into a state $q'$ in
a row $i$ with $i \neq 3$. By $a^n$, state $p'$ goes to $(3, n)$, while $q'$ goes to
$(i, n)$. Now $b^{m-2-3}a$ is accepted from $(3, n)$ but not from $(i, n)$. Next,
let $p$ and $q$ be two distinct rejecting states in the same row. If they are in the
last row, then a word in $a^*$ distinguishes them. Otherwise, we can get them into
states $(3, j)$ and $(3, \ell)$ with $j < \ell$, using $b$-transitions. Now $(3, j)$
accepts $a^{n-1-j}$ while $(3, \ell)$ goes to the rejecting state $(3, n)$. Finally,
consider two distinct accepting states different from $(m - 1, n)$ and
$(m, n - 1)$. By $b$, they go to two distinct rejecting, and so distinguishable,
states. The proof for symmetric difference is similar, except that now state
$(m - 1, n - 1)$ is empty. □
We now show that the upper bound for union of binary bifix-free languages is
the same as the lower bound in the proposition above.

Proposition 7 (Union: Binary Bifix-Free Languages; Upper Bound). Let
$m, n \ge 4$ and let $K$ and $L$ be binary bifix-free languages with
$\kappa(K) = m$ and $\kappa(L) = n$. Then $\kappa(K \cup L) \le mn - (m + n) - 2$.

Proof. Let $K$ be a bifix-free language accepted by the quotient automaton
$\mathcal{A}$ over $\{a, b\}$ with states $1, 2, \ldots, m$, where 1 is the initial
state, $m - 1$ is the only accepting state and it accepts only $\varepsilon$, and
$m$ is the empty state. Let $L$ be a similar language accepted by $\mathcal{B}$ with
states $1, 2, \ldots, n$, initial state 1, state $n - 1$ accepting $\varepsilon$,
and empty state $n$.

Construct the corresponding cross-product automaton with states $(i, j)$, where
$i$ is a state of $\mathcal{A}$ and $j$ is a state of $\mathcal{B}$. In this
cross-product automaton, we cannot go from columns $n - 1$ and $n$, as well as from
rows $m - 1$ and $m$, back to any state $(i, j)$ with $i < m - 1$ or $j < n - 1$.

If state 1 of A goes by both inputs a and b to a state in {m — 1, m}, then no 
row i with i < m — 1 can be reached. Therefore, if the bound is to be met, at least 
one input, say a, takes state 1 to a state i with i < m — 1. Suppose also that b takes 
1 to a state in {m — 1, m}. A similar condition applies to L. Suppose that input b 
takes state 1 of B to a state j with j < n — 1, and a, to a state in {n — 1, n}. Then 
no state (i, j) with i < m — 1 or j < n — 1 can be reached. It follows that, without 
loss of generality, each automaton must take its initial state by a to a state that is 
neither accepting nor empty; for convenience, let this state be 2 in both automata. 
Then no other transition by a may go to state 2 in the two automata, otherwise 
they would not be suffix-free. 

It follows that in the cross-product automaton, all the states in row 2 and
column 2, except for $(2, 2)$, must be reached from some states by input $b$. Thus,
if all the states are reachable, there must be an incoming transition by $b$ to each
state $i$ with $i > 2$ in $\mathcal{A}$ and $j$ with $j > 2$ in $\mathcal{B}$. In
particular, if state $(m - 1, 2)$ or $(2, n - 1)$ is reachable, then some state, say
$p_1$ (respectively $q_1$), different from $m - 1$ (respectively $n - 1$) must go to
state $m - 1$ (respectively $n - 1$) in $\mathcal{A}$ (respectively $\mathcal{B}$).
Now since $p_1$ goes to $m - 1$ by $b$, it cannot go anywhere else by $b$. Thus
there must be some other state $p_2$ not in $\{p_1, m - 1, m\}$ that goes to $p_1$
by $b$. Then there must be a state $p_3$ not in $\{p_2, p_1, m - 1, m\}$ that goes
to $p_2$ by $b$, and so on. Eventually, we have

$p_{m-3} \xrightarrow{b} p_{m-4} \xrightarrow{b} \cdots \xrightarrow{b} p_3
\xrightarrow{b} p_2 \xrightarrow{b} p_1 \xrightarrow{b} m - 1 \xrightarrow{b} m$,

where all the states are pairwise distinct, and no state, except possibly state 1,
goes by $b$ to state $p_{m-3}$.
First assume state 1 goes to state $p_{m-3}$ by $b$. If $p_{m-3} = 2$, then state 1
goes to state 2 by $a$ and by $b$. This means that there is no other transition to
state 2, and so row 2 is not reachable in the cross-product automaton. If
$p_{m-3} > 2$ and 1 goes to $p_{m-3}$ by $b$, then no other state goes to $p_{m-3}$
by $b$ because of suffix-freeness, and so row $p_{m-3}$ may only be reached by
$a$'s. However, in such a case state $(p_{m-3}, 2)$ is unreachable, since it is in
row $p_{m-3}$, which can be reached only by $a$'s, and at the same time in column 2,
which can be reached only by $b$'s.

Now assume that there is no transition by $b$ going to state $p_{m-3}$. If
$p_{m-3} \ge 3$, then $(p_{m-3}, 2)$ is unreachable. If $p_{m-3} = 2$, then the whole
row 2, except for $(2, 2)$, is unreachable. The same considerations hold for
automaton $\mathcal{B}$. This gives the desired upper bound $mn - (m + n) - 2$. □

We finally consider union and symmetric difference of binary factor-free
languages, and give lower bounds. We conjecture that the bounds are tight.

Proposition 8 (Union, Symmetric Difference: Binary Factor-Free Languages).
Let $m, n \ge 6$. There exist binary factor-free languages $K$ and $L$ with
$\kappa(K) = m$ and $\kappa(L) = n$ such that
$\kappa(K \cup L), \kappa(K \oplus L) \ge mn - (m + n) - \min\{m - 3, n - 3\}$.
We conjecture that this is the largest bound for binary factor-free languages.

Proof. Consider binary languages $K = a(b^*a)^{m-3}$ and
$L = (a \cup b)(ba^*)^{n-4}b$. Quotient automata for $K$ and $L$ are shown in
Figure 7.

To show that the languages are factor-free, observe that every word $w$ in $K$
has exactly $m - 2$ $a$'s, while every proper factor of $w$ has fewer than $m - 2$
$a$'s. Thus $K$ is factor-free. For $L$, every word $w$ in $L$ either has $a$ as a
prefix and has $n - 3$ $b$'s, or has $n - 2$ $b$'s. However, every proper factor of
$w$ either has $a$ as a prefix and has at most $n - 4$ $b$'s, or has at most
$n - 3$ $b$'s. Thus $L$ is also factor-free.
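
The factor-freeness of these two witnesses can be spot-checked by enumerating short words. The sketch below uses Python's `re` module with the patterns `a(b*a){m-3}` and `[ab](ba*){n-4}b`; the helper names `members` and `factor_free`, the choice m = 6, n = 7, and the length cut-off 10 are arbitrary assumptions made for illustration. A bounded enumeration of this kind is only a sanity check, not a proof.

```python
import re
from itertools import product


def members(pattern, max_len, alphabet="ab"):
    """All words of length at most max_len that fully match the pattern."""
    regex = re.compile(pattern)
    return [
        "".join(w)
        for k in range(max_len + 1)
        for w in product(alphabet, repeat=k)
        if regex.fullmatch("".join(w))
    ]


def factor_free(words):
    """True if no word is a proper factor of another word in the list."""
    return not any(u != v and u in v for u in words for v in words)


m, n = 6, 7
K = members(r"a(b*a){%d}" % (m - 3), 10)
L = members(r"[ab](ba*){%d}b" % (n - 4), 10)

print(factor_free(K), factor_free(L))
print(set(K) & set(L))  # words of K end in a, words of L end in b
```

The second line prints the empty set, which matches the remark in the proof that the two languages are disjoint.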

Construct the cross-product automaton for language $K \cup L$; see Figure 8.



Figure 7: Binary factor-free languages K and L meeting quotient complexity
$mn - (m + n) - (m - 3)$ for union and symmetric difference.

Figure 8: Cross-product automaton for automata from Figure 7; m = 6, n = 7.

Consider the following family $\mathcal{R}$ of $mn - (m + n) - (m - 3)$ states:

$\mathcal{R} = \{(1, 1), (2, 2)\} \cup \{(i, j) \mid 2 \le i \le m - 2,\ 3 \le j \le n\}
\cup \{(m - 1, j) \mid 3 \le j \le n - 2\} \cup \{(m, j) \mid 2 \le j \le n\}$,

and let us show that all states in $\mathcal{R}$ are reachable and pairwise
distinguishable. The initial state $(1, 1)$ goes to state $(2, 2)$ by $a$, then to
state $(2, 3)$ by $b$, and then to state $(i, j)$ with $2 \le i \le m - 2$ and
$3 \le j \le n$ by $a^{i-2} b^{j-3}$. Each state $(m - 2, j)$ with
$3 \le j \le n - 2$ goes to state $(m - 1, j)$ by $a$. State $(m, j)$ with
$2 \le j \le n$ is reached from the initial state $(1, 1)$ by $b^{j-1}$. Thus all
the states in $\mathcal{R}$ are reachable.

For distinguishability, notice that $a^{m-2}$ is accepted only from state $(1, 1)$.
Among the other states, two rejecting states in two distinct rows go to two distinct
states in column $n$ by $b^n$, and the two states in column $n$ are distinguished by
a word in $a^*$. Two rejecting states in the same row $i$ go by a word in $b^*$ to
states $(i, n - 1)$ and $(i, n)$ that are distinguished by $\varepsilon$. Two
distinct accepting states in family $\mathcal{R}$ go by $b$ either to two states,
one of which is accepting and the other rejecting, or to two distinct rejecting, and
so distinguishable, states.
The proof for symmetric difference is exactly the same; notice that the
languages are disjoint, and so their symmetric difference is the same as their union.

Since union is a commutative operation, we may assume $m \le n$, and then the
lower bound for binary factor-free languages is $mn - (m + n) - (m - 3)$. We
did some computations by enumerating all the binary factor-free automata in the
case of $m, n \le 6$. The following table contains all the enumerated results:



All the entries, except for 21 ($m = n = 6$), are the same as for binary bifix-free
languages. In case $m = n = 6$, the complexity of union of binary factor-free
languages is 21, that is, $mn - (m + n) - (m - 3)$. Thus it is the same as our lower
bound. This is confirmed by the partial enumeration for $m = 6$ and $n = 7$, where
we used a partial list of binary factor-free automata for $n = 7$.

After quite a few unsuccessful attempts to get a larger value by the union of
binary factor-free languages, we conjecture that $mn - (m + n) - (m - 3)$ is an
upper bound if $m \le n$. □

We now turn our attention to subword-free languages. The next theorem gives 
tight bounds for all four boolean operations and shows that the bounds cannot be 
met using a fixed alphabet. 

Theorem 2 (Boolean Operations: Subword-Free Languages). Let $K$ and $L$
be subword-free languages over an alphabet $\Sigma$, with $\kappa(K) = m$ and
$\kappa(L) = n$, where $m, n \ge 4$. Then

1. $\kappa(K \cup L), \kappa(K \oplus L) \le mn - (m + n)$, and the bound is tight
if $|\Sigma| \ge m + n - 3$;

2. $\kappa(K \cap L) \le mn - 3(m + n - 4)$, and the bound is tight if
$|\Sigma| \ge m + n - 7$;

3. $\kappa(K \setminus L) \le mn - (2m + 3n - 9)$, and the bound is tight if
$|\Sigma| \ge m + n - 6$.

Moreover, the bounds cannot be met for smaller alphabets.

Proof. Since subword-free languages are bifix-free, all the upper bounds apply.
To prove tightness, let
$\Sigma = \{a, b, c\} \cup \{d_i \mid 3 \le i \le m - 1\} \cup \{e_j \mid 3 \le j \le n - 1\}$.
Consider the languages $K$ and $L$ defined by the following quotient equations:



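Enumerated complexities of the union of binary factor-free languages for
m, n ≤ 6 (the table of enumerated results from the proof of Proposition 8):

m\n    4    5    6
4      7
5     10   13
6     13   17   21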
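\begin{align*}
K_1 &= (a \cup b \cup e_3 \cup \cdots \cup e_{n-1}) K_2 \cup \bigcup_{i=3}^{m-1} d_i K_i, \\
K_i &= a K_{i+1} \cup d_{i+1} K_{m-1}, \qquad i = 2, 3, \ldots, m - 3, \\
K_{m-2} &= (a \cup b \cup d_{m-1} \cup e_3 \cup e_4 \cup \cdots \cup e_{n-1}) K_{m-1}, \\
K_{m-1} &= \varepsilon, \\
K_m &= \emptyset, \\
L_1 &= (a \cup c \cup d_3 \cup \cdots \cup d_{m-1}) L_2 \cup \bigcup_{j=3}^{n-1} e_j L_j, \\
L_j &= a L_{j+1} \cup e_{j+1} L_{n-1}, \qquad j = 2, 3, \ldots, n - 3, \\
L_{n-2} &= (a \cup c \cup e_{n-1} \cup d_3 \cup d_4 \cup \cdots \cup d_{m-1}) L_{n-1}, \\
L_{n-1} &= \varepsilon, \\
L_n &= \emptyset.
\end{align*}

Figure 9: Subword-free witness languages for boolean operations; m = 5, n = 6.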

Figure 9 shows the quotient automata for languages $K$ and $L$ if $m = 5$ and
$n = 6$. All the omitted transitions go to the empty states $m$ and $n$.

Let us show that languages $K$ and $L$ are subword-free. For this purpose, let

$\Gamma = \{a, b, e_3, e_4, \ldots, e_{n-1}\}$ and $\Delta = \{d_3, d_4, \ldots, d_{m-1}\}$.

Notice that no word in $\Gamma^*$ of length less than $m - 2$ is in $K$. Now let $w$
be a word in language $K$. Then word $w$ either contains no letter from $\Delta$, or
contains at most two such letters. If $w$ contains no letter from $\Delta$, then $w$
is a word in $\Gamma^*$ of length $m - 2$, and so no proper subword of $w$ is in
$K$. If $w$ contains exactly one letter from $\Delta$, then either $w = u d_i$ for
some word $u$ in $\Gamma^*$ of length $i - 2$, or $w = d_i v$ for some word $v$ in
$\Gamma^*$ of length $m - 1 - i$. In both cases, no proper subword of $w$ is in
language $K$. Finally, if $w$ contains two letters from $\Delta$, then
$w = d_i a^k d_{i+k+1}$, where $k \ge 0$ and $3 \le i < i + k + 1 \le m - 2$. No
proper subword of such a word is in language $K$. This means that language $K$ is
subword-free. The proof for language $L$ is similar.

Figure 10: Reachability in the cross-product automaton for the union of languages
from Figure 9, and transitions by b and c.

Figure 10 depicts the cross-product automaton of the dfa's for languages $K$
and $L$ defined in Figure 9, where we show only the transitions necessary to prove
reachability and those caused by $b$ and $c$. In the cross-product automaton, states in
the first row and the first column, except for the initial state $(1, 1)$, are unreachable.
Now consider the remaining states. All the states in the second row and the second
column are reached from $(1, 1)$ by symbols in $\Sigma$. Each other state is reached from
a state in the second row or second column by a word in $a^*$.

For union, all the states in row $m-1$ and in column $n-1$ are accepting, and
the three states $(m, n-1)$, $(m-1, n-1)$, and $(m-1, n)$ accept only $\varepsilon$, and so are
equivalent. These three states are distinguishable from all other accepting states,
since each of the other accepting states accepts at least one non-empty word. Now
let $(i, j)$ and $(k, \ell)$ be two distinct states other than the three states accepting only
$\varepsilon$. First assume that $i < k$. If $i = m-1$, then state $(i, j)$ is accepting
while state $(k, \ell)$ is rejecting. If $i \le m-2$, then $a^{m-2-i}b$ is accepted from state
$(i, j)$, but not from state $(k, \ell)$. Symmetrically, if $j < \ell$, then either $\varepsilon$ or $a^{n-2-j}c$
distinguishes the two states. Therefore all the $mn - (m+n)$ states are pairwise
distinguishable.
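
The reachability and distinguishability argument for union can be checked mechanically for small $m$ and $n$. The Python sketch below is only an illustration: it transcribes the transitions from the quotient equations as they are read in this section (states numbered $1, \ldots, m$ and $1, \ldots, n$, every unlisted transition going to the empty state), builds the reachable part of the cross-product automaton, and counts equivalence classes by Moore partition refinement. The helper names `witness` and `union_complexity` are hypothetical.

```python
def witness(size, special, own, foreign, sigma):
    """Transitions of one witness dfa with states 1..size.

    State size-1 is the only accepting state (quotient epsilon) and state
    `size` is the empty state; unlisted transitions go to the empty state.
    `own` maps an index i to this language's indexed letter (d_i or e_j).
    """
    delta = {(q, x): size for q in range(1, size + 1) for x in sigma}
    for x in ["a", special] + foreign:
        delta[1, x] = 2                    # first term of K_1 (resp. L_1)
        delta[size - 2, x] = size - 1      # K_{m-2} (resp. L_{n-2})
    for i, x in own.items():
        delta[1, x] = i                    # terms d_i K_i (resp. e_j L_j)
    for i in range(2, size - 2):           # i = 2, ..., size-3
        delta[i, "a"] = i + 1
        delta[i, own[i + 1]] = size - 1
    delta[size - 2, own[size - 1]] = size - 1
    return delta


def union_complexity(m, n):
    """Return (reachable product states, classes after minimization)."""
    d = {i: f"d{i}" for i in range(3, m)}
    e = {j: f"e{j}" for j in range(3, n)}
    sigma = ["a", "b", "c"] + list(d.values()) + list(e.values())
    K = witness(m, "b", d, list(e.values()), sigma)
    L = witness(n, "c", e, list(d.values()), sigma)

    # Reachable part of the cross-product automaton, starting from (1, 1).
    seen, stack = {(1, 1)}, [(1, 1)]
    while stack:
        p, q = stack.pop()
        for x in sigma:
            t = (K[p, x], L[q, x])
            if t not in seen:
                seen.add(t)
                stack.append(t)

    # Moore refinement; a product state accepts for union if either part does.
    block = {s: int(s[0] == m - 1 or s[1] == n - 1) for s in seen}
    while True:
        sig = {s: (block[s],) + tuple(block[K[s[0], x], L[s[1], x]] for x in sigma)
               for s in seen}
        ids = {v: k for k, v in enumerate(sorted(set(sig.values())))}
        if len(ids) == len(set(block.values())):
            return len(seen), len(ids)
        block = {s: ids[sig[s]] for s in seen}


if __name__ == "__main__":
    print(union_complexity(5, 6))
```

For $m = 5$ and $n = 6$ the sketch should report 21 reachable states, that is $1 + (m-1)(n-1)$, and 19 classes, that is $mn - (m+n)$. It assumes $m, n \ge 4$, since the indexed letters $d_{m-1}$ and $e_{n-1}$ must exist.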

For symmetric difference, state $(m-1, n-1)$ is empty; the rest of the proof is the
same as for union.
For intersection, the only accepting state is $(m-1, n-1)$, and all the rejecting
states in the last two rows and last two columns are empty. Next, the word $a$ is
accepted only from state $(m-2, n-2)$, the word $d_i$ ($3 \le i \le m-2$) is accepted
only from state $(i-1, n-2)$, while the word $e_j$ ($3 \le j \le n-2$) is accepted only from
state $(m-2, j-1)$. This means that for each state $(i, j)$ there exists a word in
$a^*(a \cup d_3 \cup \cdots \cup d_{m-2} \cup e_3 \cup \cdots \cup e_{n-2})$ that is accepted only from $(i, j)$. So we
get $mn - 3(m+n-4)$ pairwise distinguishable states. Notice that here we do
not use transitions by symbols $b$, $c$, $d_{m-1}$, $e_{n-1}$, and so we can simply omit these
symbols to get witness languages over an alphabet of size $m+n-7$.

For difference, all the states in row $m-1$, except for state $(m-1, n-1)$, are
accepting and accept $\varepsilon$. All the states in the last row, as well as state $(m-1, n-1)$,
are empty, and states $(i, n-1)$ and $(i, n)$ with $2 \le i \le m-2$ are equivalent. States
in different rows (up to row $m-1$) are distinguished by a word in $a^*b$. States in
row $m-2$ are distinguished by a word in $a \cup e_3 \cup e_4 \cup \cdots \cup e_{n-2}$ because $a$
distinguishes states $(m-2, n-2)$ and $(m-2, n-1)$, and if $2 \le j < \ell \le n-1$
and $j \ne n-2$, then word $e_{j+1}$ is not accepted from $(m-2, j)$ but is accepted
from $(m-2, \ell)$. Next, states $(i, n-2)$ and $(i, n-1)$ with $2 \le i \le m-3$
are distinguished by $d_{i+1}$. Finally, if two distinct states are in the same row, then
there is a word in $a^*$ by which the two states either go to two distinct states in
row $m-2$, or to two states $(i, n-2)$ and $(i, n-1)$ with $2 \le i \le m-3$. In both
cases the resulting states are distinguishable, which proves the distinguishability
of $mn - (2m+3n-9)$ states. Notice that now we do not use transitions by
$c$, $d_{m-1}$, $e_{n-1}$, and so the bound is met for an alphabet of size $m+n-6$.

We now show that the upper bounds cannot be met using smaller alphabets.
Let the quotients of $K$ and $L$ be $K = K_\varepsilon = K_1, K_2, \ldots, K_{m-2}, K_{m-1} = \varepsilon, K_m = \emptyset$,
and $L = L_\varepsilon = L_1, L_2, \ldots, L_{n-2}, L_{n-1} = \varepsilon, L_n = \emptyset$, ordered as in Proposition 3.
By Lemma 1, all the quotients of the form $K_2 \cup L_j$ or $K_i \cup L_2$ must be
reached by letters if the bound is to hold, and this is impossible if the size of the
alphabet is smaller than the number of such quotients. □

5 Product and Star 

The complexity of product of prefix-free languages is $m+n-2$ [12]. For suffix-free
languages, the complexity is $(m-1)2^{n-1}+1$ [11]. Since bifix-free languages
are prefix-free, and the witness prefix-free languages $a^{m-2}$ and $a^{n-2}$ are also
subword-free, we have the following result.

Theorem 3 (Product). If $K$ and $L$ are bifix-free with $\kappa(K) = m$ and $\kappa(L) = n$,
where $m, n \ge 2$, then $\kappa(KL) \le m+n-2$. Furthermore, there are unary
subword-free languages that meet this bound.
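
The unary witnesses can be checked directly, because for a finite language the left quotients can be enumerated by repeatedly stripping a leading letter. The sketch below is a hedged illustration (the function names are not from the paper): it counts the quotients of $a^{m-2}$, $a^{n-2}$, and their product.

```python
def quotient_complexity(words, alphabet):
    """Count the distinct left quotients of a finite language of strings."""
    start = frozenset(words)
    seen, todo = {start}, [start]
    while todo:
        lang = todo.pop()
        for x in alphabet:
            # Left quotient by the letter x: strip a leading x where possible.
            quotient = frozenset(w[1:] for w in lang if w.startswith(x))
            if quotient not in seen:
                seen.add(quotient)
                todo.append(quotient)
    return len(seen)


def unary_product_witness(m, n):
    """Return K = {a^(m-2)}, L = {a^(n-2)} and their product KL."""
    K = {"a" * (m - 2)}
    L = {"a" * (n - 2)}
    KL = {u + v for u in K for v in L}
    return K, L, KL


if __name__ == "__main__":
    K, L, KL = unary_product_witness(5, 6)
    print([quotient_complexity(X, "a") for X in (K, L, KL)])
```

For $m = 5$ and $n = 6$ the three counts are 5, 6, and 9, so the product has exactly $m+n-2$ quotients; the empty quotient is included in each count.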
The complexity of star is $n$ for prefix-free languages [12], and $2^{n-2}+1$ for
suffix-free languages [11]. We now extend these results to bifix-, factor-, and
subword-free languages. The quotient of $L^*$ by $\varepsilon$ is $L^* = \varepsilon \cup LL^*$, and the
following formula holds for a quotient of $L^*$ by a non-empty word $w$:

$$(L^*)_w = \Bigl(L_w \cup \bigcup_{\substack{w = uv \\ u, v \in \Sigma^+}} (L^*)_u^{\varepsilon} L_v\Bigr) L^*.$$

Theorem 4 (Star). If $L$ is bifix-free with $\kappa(L) = n$, where $n \ge 3$, then $\kappa(L^*) \le n-1$.
Furthermore, there are binary subword-free languages that meet this bound.

Proof. Assume that $L$ is bifix-free. Then it is prefix-free, has only one accepting
quotient, namely $\varepsilon$, and has the empty quotient, by Proposition 1. Moreover, since
$L$ is suffix-free, the quotient $L$ is uniquely reachable by $\varepsilon$, by Proposition 2.

Let $L_w$ be a non-empty quotient of $L$ by a non-empty word $w$. Let us show that
$(L^*)_u^{\varepsilon} = \emptyset$ for every proper non-empty prefix $u$ of $w$. Assume for contradiction
that $\varepsilon \in (L^*)_u$, where $w = uv$ for some non-empty words $u$ and $v$. Then $u \in L^*$,
and so there exist words $x$ in $L$ and $y$ in $L^*$ such that $u = xy$. This gives
$L_w = L_{xyv} = \varepsilon_{yv} = \emptyset$ because $x \in L$ implies $L_x = \varepsilon$. This is a contradiction, and so
we must have $(L^*)_u^{\varepsilon} = \emptyset$. Hence, if $L_w$ is non-empty, then $(L^*)_w = L_w L^*$, by
the equation above. Now if $L_w$ is accepting, then $L_w = \varepsilon$, and so $(L^*)_w = L^* = (L^*)_\varepsilon$.
There are $n-2$ choices for rejecting and non-empty quotients $L_w$. But,
for a non-empty word $w$, we have $L_w \ne L$ since $L$ is uniquely reachable by $\varepsilon$.
This reduces the number of choices to $n-3$ (since we have $n \ge 3$). If $L_w = \emptyset$,
then by the observation above, $(L^*)_w = (L^*)_u^{\varepsilon} L_v L^*$, where $w = uv$ and $u$ is the
shortest word such that $L_v \ne \emptyset$. Such a quotient is either empty or has already
been counted. In total, there are at most $n-1$ quotients of $L^*$.

The subword-free language $a^{n-2}$ over the alphabet $\{a, b\}$ meets the bound
since the language $(a^{n-2})^*$ has $n-2$ quotients of the form $a^{n-2-i}(a^{n-2})^*$ for
$i = 1, 2, \ldots, n-2$, and it has the empty quotient, for a total of $n-1$. □
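
The count for the witness can be illustrated by brute force. The sketch below does not build an automaton; as a stated simplification, it identifies each quotient of $(a^{n-2})^*$ with its membership pattern on all words over $\{a, b\}$ of length at most $n$, which suffices to separate the quotients of this particular language. The function name is illustrative.

```python
from itertools import product


def star_quotient_count(n, alphabet="ab"):
    """Count quotients of (a^(n-2))* over {a, b}, assuming n >= 3."""
    k = n - 2                                   # the witness is L = {a^k}
    words = ["".join(t) for r in range(n + 1)
             for t in product(alphabet, repeat=r)]

    def in_star(w):
        # w is in (a^k)* iff it uses only the letter a and k divides |w|.
        return set(w) <= {"a"} and len(w) % k == 0

    # The quotient by u is represented by which bounded-length v satisfy uv in L*.
    signatures = {tuple(in_star(u + v) for v in words) for u in words}
    return len(signatures)


if __name__ == "__main__":
    print([star_quotient_count(n) for n in range(3, 8)])
```

For $n = 3, \ldots, 7$ the counts are $2, 3, 4, 5, 6$, that is $n-1$ in each case: one quotient per residue of the number of $a$'s modulo $n-2$, plus the empty quotient reached by any word containing $b$.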

6 Reversal 

The last operation we consider is reversal. In [11, 12] it was shown that the
complexity of reversal is $2^{n-2}+1$ for suffix-free or prefix-free languages. We show
that this bound can be reduced for bifix-free languages. We use the standard
method of reversing the quotient dfa $\mathcal{D}$ of $L$ to obtain an nfa for $L^R$, and then
we use subset construction to find the dfa $\mathcal{D}^R$ for $L^R$.
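
A minimal Python sketch of this method, under the assumption that the dfa is given as a dictionary of transitions with the empty state already removed: every transition is reversed, the set of accepting states becomes the initial subset, and the reachable subsets are collected. The names `reversed_subsets` and `word_dfa` are hypothetical helpers, and the number of reachable subsets is only an upper bound on the complexity of the reversed language, since equivalent subsets may still merge.

```python
def reversed_subsets(delta, accepting, alphabet):
    """Reachable subsets of the dfa obtained by reversing and determinizing.

    `delta` maps (state, letter) to a state of an incomplete dfa.
    """
    reverse = {}
    for (p, x), q in delta.items():
        reverse.setdefault((q, x), set()).add(p)   # q --x--> p in the nfa

    start = frozenset(accepting)
    seen, todo = {start}, [start]
    while todo:
        subset = todo.pop()
        for x in alphabet:
            target = frozenset(p for q in subset for p in reverse.get((q, x), ()))
            if target not in seen:
                seen.add(target)
                todo.append(target)
    return seen


def word_dfa(word):
    """Incomplete dfa of the one-word language {word}; states 0..len(word)."""
    delta = {(i, x): i + 1 for i, x in enumerate(word)}
    return delta, {len(word)}


if __name__ == "__main__":
    delta, accepting = word_dfa("aa")
    print(len(reversed_subsets(delta, accepting, "a")))
```

For the one-word language $aa$, whose quotient complexity is 4, the construction reaches the four subsets $\{2\}$, $\{1\}$, $\{0\}$, and $\emptyset$.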
Figure 11: The ternary factor-free language meeting the $2^{n-3}+2$ bound for
reversal.

Theorem 5 (Reversal: Bifix- and Factor-Free Languages). If $L$ is a bifix-free
language with $\kappa(L) = n$, where $n \ge 3$, then $\kappa(L^R) \le 2^{n-3}+2$. Moreover, there
exist ternary factor-free languages that meet this bound.

Proof. If $L$ is bifix-free, then so is $L^R$. Since $L$ is prefix-free, it has exactly one
accepting quotient, $\varepsilon$, and also has the empty quotient.

Consider the quotient automaton $\mathcal{D}$ for $L$, and remove the empty quotient and
all the transitions to the empty quotient. Reverse this incomplete dfa to get an
$(n-1)$-state nfa $\mathcal{N}$ for $L^R$. Apply the subset construction to $\mathcal{N}$ to get a dfa $\mathcal{D}^R$
for $L^R$. The initial state of dfa $\mathcal{D}^R$ is the singleton set $\{f\}$, where $f$ is the $\varepsilon$
quotient in quotient automaton $\mathcal{D}$. No other subset containing state $f$ is reachable
in $\mathcal{D}^R$ since no transition goes to state $f$ in nfa $\mathcal{N}$. This gives at most $2^{n-2}+1$
reachable states. However, language $L^R$ is prefix-free, and so all the accepting
states of $\mathcal{D}^R$ accept only the empty word, and can be merged into one state.
Besides this single accepting state, there remain at most the $2^{n-3}$ subsets containing
neither $f$ nor the initial state of $\mathcal{D}$, together with $\{f\}$. Hence
$\kappa(L^R) \le 2^{n-3}+2$.

If $n = 3$ or $n = 4$, then the factor-free languages $a$ and $aa$, respectively, meet the
bounds.

If $n \ge 5$, then consider the language $L = cKc$, where $K$ is a regular language
over the alphabet $\{a, b\}$ with $\kappa(K) = n-3$ meeting the upper bound $2^{n-3}$ for
reversal [24]. The quotient automaton of $L$ without the empty state is shown in
Figure 11.

By Proposition 4, language $L$ is factor-free, and $\kappa(L) = n$. Since $\kappa(K^R) = 2^{n-3}$,
there exists a set $S$ of $2^{n-3}$ words over $\{a, b\}$ that define distinct quotients
of language $K^R$. Then the quotients of $cK^Rc$ by the $2^{n-3}+2$ words $\varepsilon$, $cw$ with
$w \in S$, and $cuc$ for some word $u$ in $K^R$, are distinct as well. This gives
$\kappa(L^R) = 2^{n-3}+2$. □

Theorem 6 (Reversal: Subword-Free Languages). If $L$ is a subword-free language
over an alphabet $\Sigma$ with $\kappa(L) = n$, where $n \ge 4$, then $\kappa(L^R) \le 2^{n-3}+2$.
The bound is tight if $|\Sigma| \ge 2^{n-3}-1$, but cannot be met for smaller alphabets.
The bound cannot be met if $L$ contains a word of length at least 3.
Proof. Suppose $L$ is a subword-free language such that $\kappa(L^R) = 2^{n-3}+2$. Let
$\mathcal{D} = (Q, \Sigma, \delta, s, f)$ be the quotient dfa of $L$ with $Q = \{s, q_1, \ldots, q_{n-3}, f, e\}$ as
the state set, where $e$ and $f$ correspond to the quotients $\emptyset$ and $\varepsilon$. Construct a dfa
$\mathcal{D}^R$ for $L^R$ as in the proof of Theorem 5. If $\kappa(L^R) = 2^{n-3}+2$, then the state
$\{q_1, q_2, \ldots, q_{n-3}\}$ must be reachable. Therefore there must exist a non-empty
word $v$ such that, for all $q_i$, we have $\delta(q_i, v) = f$. Now suppose there exists a
word $w$ in $L$ such that $|w| > 2$. Let $w = abx$ where $a, b \in \Sigma$ and $x \in \Sigma^+$. Also
suppose $\delta(s, a) = q_i$ and $\delta(q_i, b) = q_j$. Then we have $av, abv \in L$, showing
that $L$ is not subword-free, which is a contradiction. Hence, if any word in $L$ has
length at least 3, then $\kappa(L^R) < 2^{n-3}+2$. Now note that, if all the words in $L$
have length at most 2, the only possible quotients of $L^R$ are $L^R$, $(L^R)_a$ for all
$a \in \Sigma$, $\varepsilon$, and $\emptyset$. Therefore $\kappa(L^R) \le |\Sigma|+3$, and the second claim follows.

Now consider tightness. If $n = 3$, then the bound is met by the unary subword-free
language $a$. Let $n \ge 4$ and $\ell = 2^{n-3}-1$. Also let $\Sigma = \{a_1, a_2, \ldots, a_\ell\}$, and
let $S_1, S_2, \ldots, S_\ell$ be all the non-empty subsets of $\{1, 2, \ldots, n-3\}$. Now let

$$L^R = a_1\Bigl(\bigcup_{j \in S_1} a_j\Bigr) \cup a_2\Bigl(\bigcup_{j \in S_2} a_j\Bigr) \cup \cdots \cup a_\ell\Bigl(\bigcup_{j \in S_\ell} a_j\Bigr).$$

Since $L^R$ only contains two-letter words, languages $L^R$ and $L$ are subword-free.
The quotients of $L^R$ are $L^R$, $(L^R)_{a_i} = \bigcup_{j \in S_i} a_j$ for $i = 1, 2, \ldots, \ell$, $\varepsilon$, and $\emptyset$.
Therefore $\kappa(L^R) = \ell+3 = 2^{n-3}+2$. But for $L$, the only possible and distinct
quotients are $L$, $L_{a_i}$ for $i = 1, 2, \ldots, n-3$, $\varepsilon$, and $\emptyset$. Thus $\kappa(L) = n$. □
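
The witness construction can be tried out for small $n$. In the Python sketch below, which is an illustration under stated assumptions and not the authors' code, letter $a_i$ is represented by the integer $i$, words are tuples of letters, and the subsets $S_1, \ldots, S_\ell$ are enumerated in an arbitrary fixed order with `itertools.combinations`. All function names are hypothetical.

```python
from itertools import combinations


def quotient_complexity(words, alphabet):
    """Count the distinct left quotients of a finite language of tuples."""
    start = frozenset(words)
    seen, todo = {start}, [start]
    while todo:
        lang = todo.pop()
        for x in alphabet:
            quotient = frozenset(w[1:] for w in lang if w and w[0] == x)
            if quotient not in seen:
                seen.add(quotient)
                todo.append(quotient)
    return len(seen)


def is_subword(u, w):
    """True if u is a subsequence of w."""
    it = iter(w)
    return all(x in it for x in u)


def is_subword_free(words):
    return not any(u != w and is_subword(u, w) for u in words for w in words)


def reversal_witness(n):
    """Return (L, L^R, alphabet) for the two-letter-word witness, n >= 4."""
    indices = range(1, n - 2)                               # {1, ..., n-3}
    subsets = [s for r in range(1, n - 2) for s in combinations(indices, r)]
    alphabet = range(1, len(subsets) + 1)                   # a_1, ..., a_l
    LR = {(i, j) for i, S in zip(alphabet, subsets) for j in S}
    L = {(j, i) for (i, j) in LR}
    return L, LR, alphabet


if __name__ == "__main__":
    for n in range(4, 8):
        L, LR, sigma = reversal_witness(n)
        print(n, len(sigma), quotient_complexity(L, sigma),
              quotient_complexity(LR, sigma))
```

For each $n$ from 4 to 7 the sketch reports an alphabet of size $2^{n-3}-1$, $\kappa(L) = n$, and $\kappa(L^R) = 2^{n-3}+2$, in line with the counting in the proof.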

7 Conclusions 

Our results are summarized in Tables 1 and 2, where "B-, F-free" stands for bifix-free
and factor-free, and "S-free" for subword-free. The bounds for operations
on prefix-free languages are from [11, 13], for operations on suffix-free languages
from [9, 12, 14], and those for regular languages, from [16, 17, 27]. For languages
over a unary alphabet $\Sigma = \{a\}$, the concepts prefix-, suffix-, factor-, and subword-free
coincide, and $L$ is free with $\kappa(L) = n$ if and only if $L = \{a^{n-2}\}$.

In the case of subword-free languages the size of the alphabet cannot be decreased.
In the other cases, whenever the size of the alphabet is greater than 2, we
do not know whether or not the bounds are tight for smaller alphabets.

The fact that our bounds usually apply only when $m, n > 3$ is not a limitation,
since bifix-free languages with smaller quotient complexities are simple.
For $n = 1$, we have only $\emptyset$, for $n = 2$, only $\varepsilon$, and for $n = 3$, a subset of $\Sigma$.
The complexities of operations on such languages can be computed directly.
             $K \cup L$, $K \oplus L$   $|\Sigma|$   $K \cap L$          $|\Sigma|$   $K \setminus L$      $|\Sigma|$
free unary   $\max(m, n)$                            $m = n$                          $m$
prefix       $mn - 2$                   2            $mn - 2(m+n-3)$     2            $mn - (m+2n-4)$      2
suffix       $mn - (m+n-2)$             2            $mn - 2(m+n-3)$     2            $mn - (m+2n-4)$      2
B-, F-free   $mn - (m+n)$               3            $mn - 3(m+n-4)$     2            $mn - (2m+3n-9)$     2
S-free       $mn - (m+n)$               $s_1$        $mn - 3(m+n-4)$     $s_2$        $mn - (2m+3n-9)$     $s_3$
regular      $mn$                       2            $mn$                2            $mn$                 2

Table 1: Complexities of boolean operations on free languages; $s_1 = m+n-3$,
$s_2 = m+n-7$, $s_3 = m+n-6$.





              $KL$                 $|\Sigma|$   $L^*$                 $|\Sigma|$   $L^R$           $|\Sigma|$
free unary    $m+n-2$                           $n$                                $2^{n-2}+1$
prefix-free   $m+n-2$              1            $n$                   2            $2^{n-2}+1$     3
suffix-free   $(m-1)2^{n-1}+1$     3            $2^{n-2}+1$           2            $2^{n-2}+1$     3
B-, F-free    $m+n-2$              1            $n-1$                 2            $2^{n-3}+2$     3
S-free        $m+n-2$              1            $n-1$                 2            $2^{n-3}+2$     $2^{n-3}-1$
regular       $(2m-1)2^{n-1}$      2            $2^{n-1}+2^{n-2}$     2            $2^n$           2

Table 2: Complexities of product, star, and reversal of free languages.